Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Compatible lag function in spark-3.0 #7168

Closed
wants to merge 2 commits into from

Conversation

kecookier
Copy link
Contributor

@kecookier kecookier commented Sep 9, 2024

What changes were proposed in this pull request?

This PR addresses the lag result mismatch issue in my current version of Spark. Thank you for your input. I have reviewed the Spark code, and now I understand the differences.

In Spark 3.0, the LAG function calculates the bound using both the offset and the direction. However, in versions post Spark 3.1, the function does not consider the direction. Instead, it uses a single literal expression that includes the offset of the current row to calculate the bound. For example, in lag(), the offset will be wrapped with UnaryMinus(offset).

lag in spark 3.0 https://github.com/apache/spark/blob/branch-3.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L474

lag in spark 3.2 https://github.com/apache/spark/blob/branch-3.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L538

How was this patch tested?

Exist UT.

@github-actions github-actions bot added the CORE works for Gluten Core label Sep 9, 2024
Copy link

github-actions bot commented Sep 9, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

github-actions bot commented Sep 9, 2024

Run Gluten Clickhouse CI

@kecookier kecookier requested a review from PHILO-HE September 9, 2024 10:05
@kecookier
Copy link
Contributor Author

@PHILO-HE Can you help review this PR?

@github-actions github-actions bot added the VELOX label Sep 9, 2024
Copy link

github-actions bot commented Sep 9, 2024

Run Gluten Clickhouse CI

@@ -31,7 +31,7 @@ object WindowFunctionsBuilder {
val substraitFunc = windowFunc match {
// Handle lag with negative inputOffset, e.g., converts lag(c1, -1) to lead(c1, 1).
// Spark uses `-inputOffset` as `offset` for Lag function.
case lag: Lag if lag.offset.eval(EmptyRow).asInstanceOf[Int] > 0 =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kecookier, seems offset > 0 implies inputOffset < 0, so the original handling is expected, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PHILO-HE This PR addresses the lag result mismatch issue in my current version of Spark. Thank you for your input. I have reviewed the Spark code, and now I understand the differences.

In Spark 3.0, the LAG function calculates the bound using both the offset and the direction. However, in versions post Spark 3.1, the function does not consider the direction. Instead, it uses a single literal expression that includes the offset of the current row to calculate the bound. For example, in lag(), the offset will be wrapped with UnaryMinus(offset).

lag in spark 3.0 https://github.com/apache/spark/blob/branch-3.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L474

lag in spark 3.2 https://github.com/apache/spark/blob/branch-3.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L538

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kecookier, I see. Thanks for your clarification! Seems there is no way to let gluten adapt to the two implementations. Should you fix it in a forked gluten on your side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will close this PR.

@kecookier
Copy link
Contributor Author

I will make it compatible with Spark 3.0 in the internal repo.

@kecookier kecookier closed this Sep 10, 2024
@kecookier kecookier changed the title [VL][MINOR] Fix typo error when convert lag() to lead() [VL]Compatible lag function in spark-3.0 Sep 10, 2024
@kecookier kecookier changed the title [VL]Compatible lag function in spark-3.0 [VL] Compatible lag function in spark-3.0 Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CORE works for Gluten Core VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants